{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python使用Pandas将Excel存入MySQL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个典型的数据处理流：\n",
    "1. Pandas从多方数据源读取数据，比如excel、csv、mysql、爬虫等等\n",
    "2. Pandas对数据做过滤、统计分析\n",
    "3. Pandas将数据存储到MySQL，用于Web页面展示、后续的进一步SQL分析等等\n",
    "\n",
    "官网文档：  \n",
    "https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数据准备：学生信息Excel表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>学号</th>\n",
       "      <th>姓名</th>\n",
       "      <th>性别</th>\n",
       "      <th>年龄</th>\n",
       "      <th>籍贯</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>S001</td>\n",
       "      <td>怠涵</td>\n",
       "      <td>女</td>\n",
       "      <td>23</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>S002</td>\n",
       "      <td>婉清</td>\n",
       "      <td>女</td>\n",
       "      <td>25</td>\n",
       "      <td>河南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>S003</td>\n",
       "      <td>溪榕</td>\n",
       "      <td>女</td>\n",
       "      <td>23</td>\n",
       "      <td>湖北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>S004</td>\n",
       "      <td>漠涓</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>陕西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>S005</td>\n",
       "      <td>祈博</td>\n",
       "      <td>女</td>\n",
       "      <td>24</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     学号  姓名 性别  年龄  籍贯\n",
       "0  S001  怠涵  女  23  山东\n",
       "1  S002  婉清  女  25  河南\n",
       "2  S003  溪榕  女  23  湖北\n",
       "3  S004  漠涓  女  19  陕西\n",
       "4  S005  祈博  女  24  山东"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_excel(\"./course_datas/c23_excel_vlookup/学生信息表.xlsx\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 展示索引的name\n",
    "df.index.name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>学号</th>\n",
       "      <th>姓名</th>\n",
       "      <th>性别</th>\n",
       "      <th>年龄</th>\n",
       "      <th>籍贯</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>S001</td>\n",
       "      <td>怠涵</td>\n",
       "      <td>女</td>\n",
       "      <td>23</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>S002</td>\n",
       "      <td>婉清</td>\n",
       "      <td>女</td>\n",
       "      <td>25</td>\n",
       "      <td>河南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>S003</td>\n",
       "      <td>溪榕</td>\n",
       "      <td>女</td>\n",
       "      <td>23</td>\n",
       "      <td>湖北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>S004</td>\n",
       "      <td>漠涓</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>陕西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>S005</td>\n",
       "      <td>祈博</td>\n",
       "      <td>女</td>\n",
       "      <td>24</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      学号  姓名 性别  年龄  籍贯\n",
       "id                     \n",
       "0   S001  怠涵  女  23  山东\n",
       "1   S002  婉清  女  25  河南\n",
       "2   S003  溪榕  女  23  湖北\n",
       "3   S004  漠涓  女  19  陕西\n",
       "4   S005  祈博  女  24  山东"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index.name = \"id\"\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 创建sqlalchemy对象连接MySQL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SQLAlchemy是Python中的ORM框架，\n",
    "Object-Relational Mapping，把关系数据库的表结构映射到对象上。\n",
    "\n",
    "* 官网：https://www.sqlalchemy.org/\n",
    "* 如果sqlalchemy包不存在，用这个命令安装：pip install sqlalchemy\n",
    "* 需要安装依赖Python库：pip install mysql-connector-python\n",
    "\n",
    "可以直接执行SQL语句"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import create_engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "engine = create_engine(\"mysql+mysqlconnector://root:123456@127.0.0.1:3306/test\", echo=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方法1：当数据表不存在时，每次覆盖整个表\n",
    "\n",
    "每次运行会drop table，新建表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "df.to_sql(name='student', con=engine, if_exists=\"replace\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'CREATE TABLE `student` (\\n  `id` bigint(20) DEFAULT NULL,\\n  `学号` text,\\n  `姓名` text,\\n  `性别` text,\\n  `年龄` bigint(20) DEFAULT NULL,\\n  `籍贯` text,\\n  KEY `ix_student_id` (`id`)\\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "engine.execute(\"show create table student\").first()[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CREATE TABLE `student` (\n",
      "  `id` bigint(20) DEFAULT NULL,\n",
      "  `学号` text,\n",
      "  `姓名` text,\n",
      "  `性别` text,\n",
      "  `年龄` bigint(20) DEFAULT NULL,\n",
      "  `籍贯` text,\n",
      "  KEY `ix_student_id` (`id`)\n",
      ") ENGINE=InnoDB DEFAULT CHARSET=utf8mb4\n"
     ]
    }
   ],
   "source": [
    "print(engine.execute(\"show create table student\").first()[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(24,)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "engine.execute(\"select count(1) from student\").first()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(0, 'S001', '怠涵', '女', 23, '山东'),\n",
       " (1, 'S002', '婉清', '女', 25, '河南'),\n",
       " (2, 'S003', '溪榕', '女', 23, '湖北'),\n",
       " (3, 'S004', '漠涓', '女', 19, '陕西'),\n",
       " (4, 'S005', '祈博', '女', 24, '山东')]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "engine.execute(\"select * from student limit 5\").fetchall()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方法2：当数据表存在时，每次新增数据\n",
    "\n",
    "场景：每天会新增一部分数据，要添加到数据表，怎么处理？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>学号</th>\n",
       "      <th>姓名</th>\n",
       "      <th>性别</th>\n",
       "      <th>年龄</th>\n",
       "      <th>籍贯</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>S001</td>\n",
       "      <td>怠涵</td>\n",
       "      <td>女</td>\n",
       "      <td>23</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>S002</td>\n",
       "      <td>婉清</td>\n",
       "      <td>女</td>\n",
       "      <td>25</td>\n",
       "      <td>河南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>S003</td>\n",
       "      <td>溪榕</td>\n",
       "      <td>女</td>\n",
       "      <td>23</td>\n",
       "      <td>湖北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>S004</td>\n",
       "      <td>漠涓</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>陕西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>S005</td>\n",
       "      <td>祈博</td>\n",
       "      <td>女</td>\n",
       "      <td>24</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      学号  姓名 性别  年龄  籍贯\n",
       "id                     \n",
       "0   S001  怠涵  女  23  山东\n",
       "1   S002  婉清  女  25  河南\n",
       "2   S003  溪榕  女  23  湖北\n",
       "3   S004  漠涓  女  19  陕西\n",
       "4   S005  祈博  女  24  山东"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_new = df.loc[:4, :]\n",
    "df_new"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_new.to_sql(name='student', con=engine, if_exists=\"append\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(0, 'S001', '怠涵', '女', 23, '山东'),\n",
       " (1, 'S002', '婉清', '女', 25, '河南'),\n",
       " (2, 'S003', '溪榕', '女', 23, '湖北'),\n",
       " (3, 'S004', '漠涓', '女', 19, '陕西'),\n",
       " (4, 'S005', '祈博', '女', 24, '山东'),\n",
       " (0, 'S001', '怠涵', '女', 23, '山东'),\n",
       " (1, 'S002', '婉清', '女', 25, '河南'),\n",
       " (2, 'S003', '溪榕', '女', 23, '湖北'),\n",
       " (3, 'S004', '漠涓', '女', 19, '陕西'),\n",
       " (4, 'S005', '祈博', '女', 24, '山东')]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "engine.execute(\"SELECT * FROM student where id<5 \").fetchall()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 问题解决：先根据数据KEY删除旧数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=5, step=1, name='id')"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_new.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "delete from student where id=0\n",
      "delete from student where id=1\n",
      "delete from student where id=2\n",
      "delete from student where id=3\n",
      "delete from student where id=4\n"
     ]
    }
   ],
   "source": [
    "for id in df_new.index:\n",
    "    ## 先删除要新增的数据\n",
    "    delete_sql = f\"delete from student where id={id}\"\n",
    "    print(delete_sql)\n",
    "    engine.execute(delete_sql)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "engine.execute(\"SELECT * FROM student where id<5 \").fetchall()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(19,)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "engine.execute(\"select count(1) from student\").first()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 新增数据到表中\n",
    "df_new.to_sql(name='student', con=engine, if_exists=\"append\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(0, 'S001', '怠涵', '女', 23, '山东'),\n",
       " (1, 'S002', '婉清', '女', 25, '河南'),\n",
       " (2, 'S003', '溪榕', '女', 23, '湖北'),\n",
       " (3, 'S004', '漠涓', '女', 19, '陕西'),\n",
       " (4, 'S005', '祈博', '女', 24, '山东')]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "engine.execute(\"SELECT * FROM student where id<5 \").fetchall()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(24,)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "engine.execute(\"SELECT count(1) FROM student\").first()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
